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Abstract 

In this paper we present a topology optimization technique applicable to a broad range of flow design 
problems. We propose also a discrete adjoint formulation effective for a wide class of Lattice Boltzmann 
Methods (LBM). This adjoint formulation is used to calculate sensitivity of the LBM solution to several 
type of parameters, both global and local. The numerical scheme for solving the adjoint problem has 
many properties of the original system, including locality and explicit time-stepping. Thus it is possible 
to integrate it with the standard LBM solver, allowing for straightforward and efficient parallelization 
(overcoming limitations typical for discrete adjoint solvers). This approach is successfully used for the 
channel flow to design a free-topology mixer and a heat exchanger. Both resulting geometries being very 
complex maximize their objective functions, while keeping viscous losses at acceptable level. 
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1. Introduction 

The goal of the present work was to develop an optimization framework for a variety of multi-physics 
fluid flow problems. This framework is intended to provide consistent objective function improvement 
for any physical model implemented, while the numerical method is expected to allow for straightforward 
parallelization to achieve peak performance on clusters of modern nVidia CUDA capable GPUs. 

Optimal design in fluid flow problems has received considerable attention in recent years by both engineers 
and mathematicians. For a survey of CFD optimization methods we refer the reader to the book by 
Mohammadi and Pironneau [Tj. In general, flow domain optimization problems are solved with two main 
approaches called shape and topology optimization. 

The first, more traditional approach, consists in parametrization of the geometry and subsequent finding 
the parameters corresponding to the best objective function [2]. This limits the range of available shapes as 
topology cannot change during the optimization process. Nevertheless the approach remains very efficient 
and accurate in assessing the influence of fine details of geometry. 

The second approach defines the unknown shape using a global scalar field indicating presence of fluid or 
solid at each point of the domain (different scalar fields are used for this purpose, nevertheless the artificial 
porosity seems to be most popular mm)- This approach is very well established for solids and structures; 
for an extensive review we refer the reader to the book by Bendspe and Sigmund [5]. 

Discretization of a fluid problem is typically obtained by finite volume, finite element, or discontinuous 
Galerkin methods. These methods use fine body-fitted meshes, which may be difficult to generate for 
complex 3D geometries. Moreover these approaches are quite difficult to apply for multiphase flows with 
variable complex interfaces. In contrast Lattice Boltzmann Methods (LBM) are based on the concept of 
solving discrete Boltzmann equation on Cartesian grids. This in principle allows to tackle geometries of 
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extreme complexity (e.g., see application of LBM to the filtration problems 0 E|). LBM is proven to 
converge to incompressible Navier-Stokes equations for the low Mach number limit and in recent years has 
gained broad recognition for calculation of micro-fluids, multi-phase problems, and flows through porous 
media 00E]. For a comprehensive introduction to LBM we refer the reader to the book by Succi 0 . The 
geometry in LBM is described by switching specific nodes of the mesh on and off and applying a so-called 
Bounce Back boundary condition. This inherent immersed boundary approach is an ideal candidate for 
topology optimization technique, where very complex shapes are expected to appear. 

To achieve acceptable efficiency of the optimization algorithm for high-dimensional design spaces, the 
gradient of the objective function is needed. For this purpose we use a well established adjoint approach Emm 
HUH- In the literature, two main types of this approach can be found. The first is a continuous adjoint, which 
bases on equation dual to the original form of the problem (e.g., Navier-Stokes equations). The resulting 
dual equation is discretized, in most cases, by the same scheme as the original (primal) problem 0. The 
second approach is a discrete adjoint, which formulates equations dual to the already discretized form of the 
original problem muni- In the present work we use the latter approach, as it can be easily automated, at 
the same time providing the accurate value of the computed gradient. 

Despite advantages, the discrete adjoint exhibits drawbacks related to parallelization. This is because 
many implementations use Automatic Differentiation (AD) techniques to the whole residual calculation 
procedure. In most cases AD tools are incapable of handling MPI directive^)] which makes the resulting 
code inherently serial. In contrast, in the present work we apply AD only to small local procedures, which 
allows the adjoint code to be both efficient and parallel. 

The optimization in the context of Lattice Boltzmann Methods was addressed previously by Pingen, 
Evgrafov et al. mmm also with the use of adjoint approach. The adjoint for Lattice Boltzmann Methods 
was also considered for parameter identification by Tekitek et al. m, but not for geometry optimization. 
The problem of low-Reynolds number mixing was addressed earlier with both Finite Element Method by 
Andreasen et al. pT j . and with LB1VI by Al akhija et al. HU, however the results of the latter work are limited 
to the 2D channel flows and represent low topological complexity. In all of the above cases, the adjoint 
formulation is derived by hand for a specific model. 

In contrast the present work provides a fully parallel adjoint formulation for a wide class of Lattice 
Boltzmann schemes, and applies the method to 3D topological optimization in the context of the coupled 
flow and heat transfer problem. The optimization is taking advantage of a one-shot approach, which further 
lowers the numerical cost. The obtained 3D optimal heat exchanger has complex, almost fractal geometry. 

In Section [2] we describe a framework for a class of numerical LBM schemes and provide for them 
an adjoint formulation. In Section [3] we show details of the current LBM presenting: (i) the topology 
parametrization, (ii) the optimization algorithm and (iii) the implementation for the CUDA architecture. 
In Section [4] we formulate two distinct topology optimization problems and analyze the results obtained. 
We also investigate the parallel performance of the code, considering both weak and strong scaling. The 
two test cases concern: a low Raynolds number mixer and a free-topology heat exchanger. The final Section 
presents the conclusions of the presented work. 

2. Local Lattice Boltzmann Scheme and adjoint formulation 

To address a wider class of Lattice Boltzmann Methods and to preserve important parallelization prop¬ 
erties in the adjoint formulation, we define a subclass of Local Lattice Boltzmann Schemes (LLBS). The 
procedure of constructing an adjoint formulation presented in this paper can be applied to any Lattice 
Boltzmann Method that fits into the proposed subclass. 

The main features of LLBS are locality of the collision operator and reversibility of the streaming. These 
constraints still allow to have a different collision operator for each lattice node, provided that the operator 
does not use information from the surrounding nodes (apart from the streamed information). Most Lattice 


1 More modern AD software can in principle cope with MPI directives im but the process is still not completely automated 
and may produce a very inefficient code. 
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Boltzmann schemes, including single and multiple relaxation time schemes, fit into this framework. Many 
other schemes, including those for multi-phase flows, can be suitably reformulated for this framework by 
adding additional densities. These properties of LLBS allow for a unified formulation of the adjoint problem, 
at the same time preserving high parallel efficiency for different physical models. 

2.1. Local Lattice Boltzmann Scheme 

For the purpose of generalization we propose the following mathematical framework. Let us define lattice 
L as a finite set L = {0 ,L X — 1} x {0 ,L y — 1} x {0,..., L z — 1}. For a fixed M we define density 
space M M , with its elements describing the physical state at each node. The complete state of our system 
is expressed by the states at all the lattice nodes / : L —>■ M. m . The linear space of all possible states will 
be called a state space H = {/ : L —► R M }. Let us also define a parameter space Pcl p describing the 
allowable range for the optimization variables a £ P. 

The Lattice Boltzmann Method is characterized by complete separation of communication between nodes 
(called streaming) and calculations at each node (called collision). This feature allows for very efficient 
massive parallelization and nearly linear scalability PH- 
At each lattice node x £ L, the streaming is defined as: 

(Sf)j{x + e-j) = fj(x) for j = 1, ■ ■ • , M 

where ej denotes the shift of information related to the j-tli density from the node x to the neighbouring node 
x + e.j. The shift x + e.j is assumed to be periodic in all directions (further on, the increment/decrement shift 
will always mean addition/subtraction modulo L x , L y and L z ). This periodicity of the streaming operator 
is vital for simplicity of the adjoint formulation, yet by means limits the generality of the approach. 

The collision operator W x (f(x),a) is defined for each node x of the Lattice, its first argument denotes 
the set of densities, while the second the possible dependence on all optimization parameters a £ P. In 
general we may have a separate collision operator W x : R M —» for each node of the lattice x £ L. 

A single step n of the Local Lattice Boltzmann iterative scheme is finally defined as a composition of 
streaming and collision operators: 

f"- +1 {x + ej)=W x (f n (x),a) for j = 1,- ■ ■ ,M (1) 

It has to be stressed again, that W x depends only on the densities at a given node, but not on the 
densities at the neighbouring nodes. 

The stationary solution /, forms a fixed point of the iterative scheme ([lj: 

fj 0 + ej)= Wf (/( x), a) (2) 

It should be noted that this stationary solution / implicitly depends on all parameters a. 

It will be assumed now, that the objective function, we want to optimize, has the following general form: 

F(a)=J2 FX (f( x )) (3) 

X 

where node dependent function F x can be quite arbitrary. Other forms of <© are possible, yet this formula 
seems sufficient to represent typical optimization problems in hydrodynamics. 

2.2. Adjoint Lattice Boltzmann Scheme 

Before we go any further, we need to make a simple observation, based on periodicity of the shift operator 

S. 

Lemma 1. For any f £ M, we have "Yh xi fi{x) = J2xifi( x + &;)• From which it follows that for any 

a,b £ H: 

y a t (x)bi(x) = ^2 a,(a; + ei)6 *(x + ef) and ^ ai(x)b,(x + e») = ^ a*(a; - e,)6i(x) 

x,i x,i x,i x,i 
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It can be noted that, for a scalar product (a, b) = Y^ x i a,i(x)bi(x) defined on H , this lemma states that 
the streaming operator S is unitary, and we have both (a, b) = ( Sa,Sb) and (a, Sb) = (S^a, b). 

Now we can introduce the adjoint Lattice Boltzmann equation and prove the required properties. 

Theorem 1. If f £ H is a fixed point solution of the Local Lattice Boltzmann equation ©> the objective 
function F is defined by © and v € IHI is a solution of the equation: 


Vk 


dW x ~ 8F X 

{x-e k ) =^2vj(x)-^-(f(x),a) + -^~r(f( x )) for k= 1,- 


dfk 


df k 


,M 


then the gradient of the objective function can be calculated as 

dF ^ ( \ - 

= 2^vfix)-^-{f{x),a) 


da 


(4) 


(5) 


By © we have defined the fixed-point adjoint Lattice Boltzmann equation. It is important to notice, 
that v is of the same size as /, while the number of equations in Q is independent of the number of 
parameters a. Solved once, it can be used to calculate gradient of the objective function with respect to 
any number (and any type) of parameters by using the formula ©. 

Proof. Let us differentiate equation © with respect to a: 


k 

Now let us multipy equation Q by ^f{x) and sum it over all indices k: 

X>(z -«>§§(*) = I>M fe (/(*>,«)§§ 


( 6 ) 


dfk 


da 


The bracket is now substituted by equation 

£ Vk ( x - ek ) §§ ( x ) = £ v i (*) f §| (*+ e s) - ^ 


(f(x),a) 




k j 

and both sides are summed over all lattice nodes: 


dfk 


da 


£ v k ( x - e fc ) ^ (x) = ^ Wj (*) (^ + ) - £ V J (*) ^ (/(*), a) + £ ^- (/>)) (®) 


da 

x,k x,j 

Finally, using Lemma [lj one obtains: 




r \ fir \ a f if dfk ( ^ 

x,j x,k 


da 


which concludes the proof. 


□ 


Basing on equation © we propose a numerical scheme in which v(x) is obtained by explicit iterations: 

(7) 


dW x „ dF x 

v k + 1 ( x ~ek ) =£«"(a:)-^r-(/(*),Q!) + i rr(f( x )) 


dfk 


dfk 


We will call this scheme the Adjoint Lattice Boltzmann Scheme (ALBS). It has clearly the same form 
as the LLBS iteration formula ©. However the streaming of v acts in the opposite direction, while the 
corresponding adjoint collision operator is defined as: 

dW x 


Wk(vJ,a) = £ < 


dfk 


dF x 

(/.«)+ «^(/) 


dfk 


( 8 ) 
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3. Implementation of the Lattice Bolztmann Method 


In this study 

a set of 

19 vectors 

e is 

used for 3D 

flow 

simulations [151: 




ei 

= [0,0,0] 











e-i 

= [1,0,0] 

e3 = 

[-1,0,0] 

e 4 

= [0,1,0] 

65 

= [o, - 

1,0] e 6 = 

= [0,0,1] 

67 = 

= [0,0,-1] 

eg 

= [1,1,0] 

e9 = 

[-1,1,0] 

eio 

= [1,-1,0] 

eii 

= [-1, 

— 1,0] ei2 = 

= [1,0,1] 

ei3 = 

= [- 1 , 0 , 1 ] 

ei4 

= [1,0,-1] 

eis = 

[-1,0,-1] 

ei6 

= [0,1,1] 

ei7 

= [o, - 

1,1] eig = 

= [ 0 , 1 ,- 1 ] 

eig = 

= [0,-1,-1] 


For the heat transfer equation we use the first 7 of the above. For 2D cases the number of vectors is reduced 
to 9 for both the fluid flow and the heat transfer. 

In LLMS the local collision operator plays different role at different nodes of the lattice. It is both 
responsible for fluid dynamics at the interior nodes, as well as for enforcing the boundary conditions. There¬ 
fore three types of nodes can be distinguished: (i) interior nodes, (ii) inlet/outlet nodes, (iii) wall nodes. 
For inlet/outlet nodes we use simple boundary conditions by Zou and He [l9j . while for the wall nodes we 
use bounce back condition [9|. For the interior nodes we use a Forced Multiple Relaxation Time (FMRT) 
collision. 


3.1. Forced Multiple Relaxation Time collision 

For a classic LB scheme, a simple (BGK) collision operator is used: 

W BGK (/) = f + UJ (/eq — /) (9) 

where / eq are the equilibrium densities depending on the macroscopic quantities, while — is called the 
over-relaxation time. This over-relaxation time is directly related to the modelled viscosity coefficient T = 
0.5 +3^ (1 < oj < 2). For stabilization of the scheme an additional transformation was introduced [IS]. This 
approach applies a linear transformation of the density set /, projecting it to a moment space (corresponding 
to geometric moments of the velocity set). Different relaxation factors are used in each direction of the 
moment space. Let U denote the matrix of this transformation. For 3D flow simulation we use the matrix 
proposed in [IS] . We define the moment projection as m = Uf. The analogue for the BGK operator ([9]) in 
the moment space can be written as: 


W m (m) = m + T (m eq — m) 


( 10 ) 


where T is a diagonal matrix of relaxation factors. Multiplying both sides of (10) with U 1 we get MRT 
collision 

W MRT (/) = / + U~ 1 TU (/ eq — /) 


In most cases it is less computationally expensive to calculate m eq than / eq . This is why a intermediate 
form is further used: 

W MRT (/ ) = JJ-1 ^ + (/ — T ) (Uf - m e q) ) 

For our applications it is important to change the global quantities during the iteration process in order 
to: 


• enforce certain values like velocity, temperature, etc. at specific points, 

• optionally add a forcing term like mass force or heat source. 

We introduce these changes by differentiating between pre-collision equilibrium (/®{? e ) and post-collision 
(/post)- This defines the Forced Multi Relaxation Time (FMRT) collision: 

W FMRT (/) = U- 1 (m e P q ost + (I — T) (Uf - m£0 ) (11) 
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The pre-collision equilibrium f^ e is based on global variables (like u and p) of the density set / prior to 
collision while f^ st is based on desired post-collision values. The FMRT can be seen as a natural extension 
of the Exact Difference Method [20] of discretization of body forces in LBM. Let us assume that / eq depends 
on the velocity u and the density p. We can apply a body force R and integrate the dynamics of u with a 
simple Euler step: M post = rt pre + -Rdt. Applying this to the FMRT scheme we get: 

W™ RT (/) = /eqKre + iZdt, p) + U^TU (/ - / eq (u pre , P )) 


Let us now discard the suffix of u to obtain: 


yyFMRT (y) 


feq(u + i?dt, p) — /eq(«,P) 


+ W MRT (/) 


where the first bracket denotes the Exact Difference discretization of the force R. This approach can be 
used for any forcing terms, including: heat flux, solvent source as well as these which enforce prescribed 
velocity or temperature at selected points. 


3.2. Topology parametrization 

In order to implement topology optimization with classic optimization algorithms, one has to parametrize 
the design space. For this purpose at each node of the lattice L we define a parameter w, which refers to 
interpolation between a fluid node (w = 1) and a solid node (w = 0). For material properties like heat 
diffusivity, we use linear interpolation j3 = w/?h u id + (1 — w)/? so iid- 

In order to represent different dynamics in the solid and in the fluid regions, the most common method 
is to add a fictitious Darcy body force R = —K(w)u [3]0|. In practical implementation we use: 

W p ost = ^ P re dt /\ (?n)li pre = G(w')Up re 

where switching function G assumes value G(l) = 1 (no additional Darcy force exists in the fluid part) 
and G(0) = 0 (corresponds to u post = 0 in the solid part). The particular selection of function G can 
significantly affect the convergence of the optimization process, nevertheless this issue is not discussed here 
being a subject of a separate study. The common choice is a power function: K(w) = Kw 6 with 0 = 3 [TlfTP]. 

The final, optimized geometry should consist of either purely solid (w = 0) or purely fluid (w = 1) nodes. 
In this way a particular form of G becomes unimportant for the evaluation of final result. To reduce the 
number of nodes with intermediate values of w (between 0 and 1) during the optimization, we introduce a 
quadratic penalty function P: 

P = ^ w(l — w) (12) 

nodes 

This penalty is gradually added to the main objective function to force the optimizer to avoid non-physical 
values of w. 

To ensure that the final geometry has a physical meaning at a final stage of optimization, every w that is 
below a threshold p is set to 0, while everything above p is set to 1. Finally the threshold level p giving the 
best objective function value is selected from the interval [0,1]. Example of the dependence of the objective 
function on the threshold level can be observed in Fig. [5] 

3.3. Optimization 

For optimization two methods are used in the present paper. The first is the Method of Moving Asymp¬ 
totes (MMA) proposed by Svanberg [21]. In this approach, at each iteration of the optimization algorithm: 

• the primal solution is iterated until convergence 

• the adjoint scheme is iterated by the same number of iterations 

• the objective function and the gradient are evaluated 

• the optimization parameters are updated by MMA 
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The second method is a simple descent algorithm. The optimization problem is solved simultaneously 
with the primal and adjoint problem using a one-shot framework (l 11 I22| . A single iteration consists of: 

• one iteration of the primal problem 


• one iteration of the adjoint problem 

• one update of the parameters: 


a. 


n+1 




(13) 


where F denotes the present estimate of the gradient from the actual adjoint solution. The sufficiently 
small parameter £ is chosen to suppress possible oscillations (caused by the delayed response). 


3.4-. Programming and Parallel Implementation 

Both the primal and the adjoint Lattice Boltzmann Schemes are solved by the in-house solver CLB. This 
solver is highly parallel and tuned for nVidia CUDA architecture. It is also capable of running on multiple 
GPUs by using MPI directives. 

To achieve high performance and to allow for automatic adjoint formulation, we have used special 
code generation techniques. The code is generated for every model separately. The model is described by a 
set of tables defining: 

• Densities — Names of the densities and corresponding streaming vectors (e.g. /, and ei in Q) 

• Settings - Names of global settings that influence the model (e.g., v) 

• Globals - Names and properties of global integrals (e.g., Volume Flux, Heat Flux) 

• Quantities — Names of macroscopic quantities that are exported by the model (e.g., u , p, T) 

All the source code is generated basing on this description and on the generic text templates. For this 
purpose an in-house R-Template (RT) tool is used, based on the R programing language j53j. 

The model collision operator W is defined as a function in a C source file, operating on the defined 
densities. As the code includes matrix multiplications and other repeatable operations, it is written partly 
using the symbolic algebra. These symbolic formulas are subsequently expanded by the R-Template tool. 

For adjoint formulation, the model information is augmented with adjoint densities. Then the model’s 
dynamics is differentiated using Automatic Differentiation (AD) tool TAPENADE [24] (for reverse differen¬ 
tiation of operator W). The produced code is subsequently used in implementation of equation ([8]). 

The complete code generation and compilation procedure is presented in Fig. [T] 


4. Results 

To verify the performance of the proposed optimization approach, two test cases were considered. The 
first test case deals with maximizing of the low Reynolds number mixing. The second test case concerns 
coupled heat and mass transfer, in which the convected heat is maximized. In both cases the new shapes 
are created in the channel with square cross-section. The topology optimization algorithm generates these 
shapes in a selected volume of this channel (called a design region). 

The fluid flow is simulated using a standard 19-density 3D MRT [IS], while for the heat transfer a 7- 
density single relaxation time BGK is used. The temperature field is always a passive scalar and does not 
influence the fluid flow. 

The walls of the channel are adiabatic with a no-slip boundary condition for velocity. At the inlet and 
at the outlet the standard Pressure Inlet/Outlet boundary conditions by Zou and He J9] are imposed. The 
pressure at the outlet is set to the reference level of 0 (corresponding to the LB density 1), while the pressure 
at inlet is set equal to the desired pressure drop. The additional constraint was imposed on the inlet, limiting 
(out of stability reasons) the velocity to 0.05. During the optimization process in both cases the Reynolds 
number remains in the range of 100-250, being strongly dependent on the shape of the generated geometry. 
Summary of settings for both cases can be found in Table [l] 
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Figure 1: Scheme of code generation and compilation in the CLB code 


Case 

Lattice size 

No. of nodes 

DoF 

Span of the 
design region 

A p 

V 

^fluid 

^solid 

Mixing 

Heat exchanger 

450 x 50 x 50 
352 x 50 x 50 

1.12X10 6 

0.88X10 6 

29.2xl0 6 

22.8x10® 

61-390 

101-252 

0.016666 

0.03 

0.02 

0.01 

0.003 

0.003 

0.003 

1 


Table 1: Summary of the setting in both test cases 


4-1. Low Reynolds number mixing 

The mixing test case consists of a square channel of height D and length 9 D. The design region of length 
~ 6.5 D is located in the middle section of the channel. At the inlet discontinuous temperature distribution 


was prescribed (T = — 1 for y < 25 and T = 1 for y > 25 — see Fig. 2a). As the heat conduction mimics 
the diffusion, the evolution of temperature can be used to represent mass mixing process. To evaluate the 
quality of mixing we use a flux of an award function: 


Mixing = f (u ■ n)(l — T)(l + T)dS = f (u ■ n)(l — T 2 )dS 

J outlet J outlet 


(14) 


This simple award function brings a balance between low variance of temperature and high volume of the 
flow. The pressure drop of A p = 0.016666 was set between the inlet and the outlet. The viscosity was taken 
as v = 0.02, while both the solid and the fluid heat diffusivity coefficients were set to a S oiid = ctfluid = 0.003 
(see Table [I]). 

If there is no obstacle in the channel, the temperature profile is evolving by simple heat conduction. The 
resulting distribution of temperature at the outlet is illustrated in Fig. |2b| This particular distribution has 
a standard deviation of a = 0.761 (see Table [2]) while the value of the objective function (14) is around 50. 

The convergence of the one-shot optimization procedure can be observed in Fig. [3} The final value of the 
objective is around 110 (the standard deviation of the temperature distribution is a = 0.162). The resulting 
geometry consists mostly of nodes with values of w > 0.9. As the diffusivity coefficients of the solid and the 












































Figure 2: Temperature on Q inlet of the channel ([b]) outlet of the channel without any obstacles and ([c]) 
outlet of the channel with the optimal geometry. 


fluid are the same and value of w = 0.9 seems sufficient to completely stop the flow, the sensitivity of the 
objective to parameter w becomes negligible. 

Application of the threshold, marginally reduces the objective. Dependence of the objective on the 
threshold level is shown in Fig. [5] In fact the standard deviation of the temperature is reduced now to 
a = 0.151, but at the same time the flow rate is reduced (see Tablej2|. The distribution of temperature at 
the outlet for the optimized geometry is illustrated in Fig. [2c] 

The MMA optimization gives just slightly better results (see Table [2]). On the other hand the MMA 
approach needs around 6 million of iterations, whereas one-shot method needs around 1 million. This 
difference occurs, because in MMA optimization we iterate the solution and the adjoint until converged 
in each optimization iteration. In the one-shot approach we use the the imprecise adjoint solution for the 
update, saving a lot of iterations at the early stage of the optimization. 

The final geometry and the temperature contour plots in the channel are presented in Fig. [4] 


4-2. Heat exchanger 

The goal of this test case is to intensify the heat exchange between a heating plate and the cooling fluid 
in a low Reynolds number channel flow. 

The heat exchanger test case consists of a square channel of the height D and the length 7 D (See Fig. [6]) . 
In the middle of the channel, a square D x D heating plate is placed at the bottom. The design region is 
the middle section of the pipe (3D). The heating plate has a fixed temperature of T = 1, while the fluid at 
inlet has T = 0. The goal is to increase the heat flux convected at the outlet. 


Heat flux - 


(u ■ n)TdS 


(15) 


/ outlet 


The pressure drop of A p = 0.03 was set between the inlet and the outlet. The viscosity coefficient was 
assumed as v = 0.01, while the solid and the fluid heat diffusivity coefficients were taken as a so iid = 1 and 
afluid = 0.003 respectively (Prandtl number = 3.33). 

If there is no obstacle in the channel, the heat exchange is marginal giving value of the objective func¬ 


tion (15) of around 2.4 and the mean temperature of 0.02. 


To make a fair comparison of the performance of the optimized geometry, a series of simple geometries 
were first investigated. Geometries consisting of one to seven evenly spaced parallel vertical fins of width of 
3-10 lattice nodes were tested with the same flow settings. An additional constraint was adopted to preserve 
a gap of at least 3 lattice nodes between the fins. The results for all fin geometries can be seen in Fig. [7] 
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Figure 3: Convergence of optimization of free-topology 
steepest descent algorithm (see 3.31. 


mixer. 


Descent speed denotes the speed £ in the 



Figure 4: Optimal free-topology mixer, generated by the one-shot algorithm. Geometry of the mixer and 
contour plots of temperature in the square channel. 
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Figure 5: Influence on the cut-off level on objective in the mixing optimization test case. 



Figure 6: Heat exchanger test case geometry 


The best heat transfer was achieved for 6 fins of width 4 (see Table [2]). For such configuration the value of 
the objective is 25.2. 

In contrast the one-shot topology optimization algorithm gives a result of around 31. However after 
application of the threshold this value reduces to 28 (see Table [2]). This behaviour is caused by a region of 
w > 0.9 acting as a porous obstacle on top of the resulting geometry. To further improve the quality of the 
result penalization of the parameter w is used. 

We add the function P (121 to the objective, and exponentially increase the weight of this function in 
the overall objective. The convergence of this process can be observed in Fig. [8j As can be clearly seen 
the penalization does not reduce the true objective function significantly. On the other hand function P 
is reduced efficiently, and the resulting geometry is more binary. Finally we loose less of the performance 
applying the threshold then we did in the case of optimization without penalization (compare Opt and 
Opt+T to Opt+P and Opt+P+T in Table [2]). 


The optimized geometry is shown in Figure 10 The increase in the objective function is 17% and in 


the mean temperature 21%, compared to the best fin geometry. When compared to the empty channel, the 
objective is 12 times higher, and the mean temperature 22 times higher. 

Using the MMA optimization provides us with even better result of the objective function around 33, but 
is reduced to 27.8 after applying the threshold. As in the earlier case, the one-shot optimization outperforms 
the MMA approach. Convergence of MMA optimization is shown in Fig. [9] The number of iterations needed 
for a convergence of the solution (without the optimization) is around 40 thousand, for one-shot optimization 
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Penalty function P Objective / HeatFlux 



Width of fins 


Figure 7: Objective function for a heat exchanger consisting of an set of parallel fins. 



Iterations [millions] 


Figure 8: Convergence of optimization of free-topology heat exchanger with the one-shot algorithm. Penalty 
denotes the weight of function P (see ©) in the objective function. 
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Figure 9: Convergence of optimization of free-topology heat exchanger with MMA algorithm. In each 
iteration of MMA algorithm (separated by vertical lines) the solution of the primal and adjoint problem is 
iterated until converged (see 3.31. 


(with penalization) it is 0.4 million, and for MMA optimization it is around 3.5 million. Additionally there 
is no simple method of adding the penalty gradually in MMA, which could prevent the drop of the objective 
function caused by the application of the threshold on the geometry. 

4.3. Parallel performance 

Parallel performance of the final code was investigated in a series of numerical experiments. The per¬ 
formance of any parallel code is strongly dependent on the size of the problem and the ratio between the 
calculations and communication. The latter however is directly related to the ratio of volume to surface of 
the part of the problem that each processor/GPU is calculating. We distinguish two type of scaling: weak 
and strong. Weak scaling measures speedup of the code, if the size of the subproblem assigned to each 
processor is the same and fixed. Strong scaling on the other hand measures the speedup with a fixed size of 
the overall problem — causing the problem to be divided to smaller parts for higher number of processors. In 
most cases, as the size of the subproblems decreases, the communication starts to dominate the calculations. 
Such system reaches a level at which adding new processors/GPUs, doesn’t improve the performance. Such 
level is sometimes called saturation. We selected three meshes: 

• ”3M” — 320 x 100 x 100 - 3.2xl0 6 elements — 83xl0 6 DoF 

• ”400K” — 160 x 50 x 50 - 400xl0 3 elements — 10x10® DoF 

• ” 12M” — 320 x 200 x 200 - 12.8 xlO 6 elements — 332xlO 6 DoF 

A series of calculations were done on 1-6 nodes with 1-8 GPUs per node (1-48 GPUs in total). Cal¬ 
culations were carried out on nodes of ZEUS supercomputer at the Cyfronet computing center. Each 
computational node of the cluster is equipped with 2 Intel Xeon E5645 processors and 8 Nvidia Tesla M2090 
GPU units. Weak scaling was investigated for ”3M” and ”400K” meshes, by running the CLB code for 
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Figure 10: Geometry of free-topology heat exchanger. rta| front and dbh back. 


Table 2: Summary of the results. SD - Standard deviation; Opt - Optimized with one-shot method; 
MMA Optimized with MMA algorithm; T — threshold applied, resulting in a binary (w = 0 and w = 1) 
geometry; P — penalty P (see <H§) gradually added to the objective function; Fins — Optimal fin heat 
exchanger 



Case 

Objective 

Velocity 

Temperature 

Topology parameter w in design region 

Iterations 

Mean 

SD 

Mean 

SD 

0 

(0,0.9) 

[0.9,0.99) 

[0.99,1) 

1 


Empty 

49.266 

0.051 

0.031 

0.000 

0.763 

— 

— 

— 

— 

100% 

0.04x10 s 

Sh 

MMA 

114.508 

0.050 

0.030 

0.041 

0.068 

<1% 

1% 

6% 

6% 

87% 

6x10® 

.a 

MMA+T 

111.863 

0.050 

0.030 

0.041 

0.175 

2% 

— 

— 

— 

98% 

— 

§ 

Opt 

113.381 

0.050 

0.031 

0.000 

0.094 

— 

— 

6% 

7% 

87% 

1x10® 


Opt+T 

110.383 

0.050 

0.030 

0.000 

0.200 

3% 

— 

— 

— 

97% 

— 


Empty 

2.376 

0.051 

0.029 

0.020 

0.049 

— 

— 

— 

— 

100% 

0.03x10® 


Fins 

25.202 

0.030 

0.017 

0.367 

0.092 

16% 

— 

— 

— 

84% 

0.03x10® 

CD 

MMA 

33.362 

0.034 

0.020 

0.424 

0.101 

10% 

14% 

7% 

4% 

65% 

3.5x10® 

§ 

MMA+T 

27.883 

0.032 

0.017 

0.378 

0.062 

25% 

— 

— 

— 

75% 

— 

•3 

Opt 

31.032 

0.025 

0.009 

0.533 

0.050 

25% 

15% 

3% 

9% 

48% 

0.4x10® 

a 

Opt+T 

28.226 

0.026 

0.013 

0.467 

0.065 

40% 

— 

— 

— 

60% 

— 


Opt+P 

30.955 

0.027 

0.010 

0.502 

0.051 

31% 

<1% 

2% 

9% 

57% 

0.4x10® 


Opt+P+T 

29.447 

0.029 

0.014 

0.446 

0.060 

32% 

— 

— 

— 

68% 

— 
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Parallel scaling 



Figure 11: Weak and strong scaling of Primal and Adjoint calculations in CLB code. 


a mesh size proportional to the number of GPUs (in this case 3.2xl0 6 and 400xl0 3 elements per GPU). 
Strong scaling was checked for ”3M” and ”12M” meshes, by running the CLB code, for a fixed size mesh. 
The results are in Table [3] and in Fig. 0 All calculations were performed in a single precision. The size 
of ”12M” mesh is slightly above the upper limit for simulation of fluid with temperature and adjoint on a 
single Nvidia Tesla M2090 (6 GB GDDR5). 

The results clearly demonstrate that the code has nearly linear weak scaling for 3.2xl0 6 elements per 
node. Strong scaling saturates at 15 GPUs for ”12M” mesh, and at 10 GPUs for ”3M” mesh. The interesting 
feature of good parallelism of the proposed adjoint formulation is that the adjoint calculations are closer to 
the linear scaling, than the primal calculations. This is caused by a average higher number of operations 
per node, while keeping the communication at exactly the same level. The speed ratio between adjoint 
and primal is consistently around 2.65, which is in good agreement with theoretical estimates of 2-3 (as 
derivative of one multiplication is two multiplications and one addition). 

5. Conclusions 

In the present work we showed an efficient technique for addressing topology optimization in complex 
mesoscopic flow problems. This approach is based on solving the problem with Lattice Boltzmann Method, 
calculation of the sensitivity with Adjoint Lattice Boltzmann, and optimization with one-shot steepest 
descent method, or MMA. 

This approach was successfully used for optimization of free-topology mixer and heat exchanger (see 
Table [2}. 

In the present work we showed an efficient technique for addressing topology optimization in complex 
mesoscopic flow problems. This approach is based on solving the problem using Lattice Boltzmann Method, 
while the sensitivities are evaluated by the Adjoint Lattice Boltzmann. The optimization process is carried 
out with one-shot steepest descent method, or the Method of Moving Asymptotes. This approach was 
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Table 3: Summary of performance tests. 


GPUs 


Strong scaling 



Weak scaling 


12M mesh 

3M mesh 

3M mesh 

400K mesh 

Primal 

Adjoint 

Primal 

Adjoint 

Primal 

Adjoint 

Primal 

Adjoint 

1 

— 

— 

298 (—) 

109 (—) 

298 (—) 

109 (—) 

289 (—) 

104 (-) 

2 

604 (—) 

219 (—) 

582 ( 2%) 

211 ( 4%) 

585 ( 2%) 

217 (< 1%) 

549 ( 5%) 

203 ( 2%) 

3 

871 (—) 

324 (<1%) 

837 ( 6%) 

305 ( 7%) 

874 ( 2%) 

325 (<1%) 

819 ( 6%) 

304 ( 2%) 

4 

1162 (—) 

432 (—) 

1089 ( 9%) 

411 ( 6%) 

1161 ( 3%) 

432 ( 1%) 

1079 ( 7%) 

403 ( 3%) 

5 

1471 (—) 

546 (—) 

1184 (21%) 

493 (10%) 

1455 ( 2%) 

542 (<1%) 

1364 ( 6%) 

506 ( 2%) 

6 

1692 ( 3%) 

641 ( 1%) 

1557 (13%) 

588 (10%) 

1735 ( 3%) 

649 ( 1%) 

1601 ( 8%) 

601 ( 3%) 

7 

1787 (12%) 

733 ( 3%) 

944 (55%) 

646 (16%) 

2018 ( 3%) 

750 ( 2%) 

1860 ( 8%) 

701 ( 3%) 

8 

2271 ( 2%) 

830 ( 4%) 

1957 (18%) 

804 ( 8%) 

2307 ( 3%) 

865 ( 1%) 

2110 ( 9%) 

797 ( 4%) 

9 

2418 ( 7%) 

926 ( 5%) 

1880 (30%) 

849 (14%) 

2577 ( 4%) 

971 ( 1%) 

2365 ( 9%) 

897 ( 4%) 

10 

2797 ( 4%) 

1063 ( 2%) 

2014 (33%) 

910 (17%) 

2874 ( 4%) 

1078 ( 1%) 

2649 ( 8%) 

997 ( 4%) 

12 

3235 ( 7%) 

1215 ( 6%) 

2452 (32%) 

1140 (13%) 

3442 ( 4%) 

1297 ( 1%) 

3181 ( 8%) 

1194 ( 4%) 

14 

2639 (35%) 

1415 ( 7%) 

1256 (70%) 

1148 (25%) 

3952 ( 5%) 

1420 ( 7%) 

3081 (24%) 

1384 ( 5%) 

15 

3963 ( 9%) 

1541 ( 5%) 

2214 (51%) 

1252 (24%) 

4293 ( 4%) 

1620 ( 1%) 

3869 (11%) 

1490 ( 4%) 

16 

4105 (12%) 

1614 ( 7%) 

2186 (54%) 

1468 (16%) 

4561 ( 4%) 

1724 ( 1%) 

4123 (11%) 

1583 ( 5%) 

18 

4427 (15%) 

1748 (10%) 

2394 (55%) 

1544 (22%) 

5148 ( 4%) 

1941 ( 1%) 

4657 (11%) 

1788 ( 4%) 

20 

4384 (25%) 

2034 ( 6%) 

2258 (62%) 

1605 (27%) 

5716 ( 4%) 

2157 ( 1%) 

4976 (14%) 

1981 ( 4%) 

21 

2907 (52%) 

2020 (11%) 

1577 (75%) 

1489 (35%) 

5902 ( 6%) 

2203 ( 4%) 

4559 (25%) 

2075 ( 5%) 

24 

4943 (29%) 

2309 (11%) 

2292 (68%) 

1887 (28%) 

6855 ( 4%) 

2585 ( 1%) 

6087 (12%) 

2369 ( 5%) 

25 

4406 (39%) 

2458 ( 9%) 

2126 (71%) 

1849 (32%) 

7123 ( 4%) 

2696 ( 1%) 

5390 (25%) 

2468 ( 5%) 

28 

4124 (49%) 

2718 (10%) 

1916 (77%) 

1800 (41%) 

7910 ( 5%) 

2905 ( 5%) 

5127 (37%) 

2720 ( 6%) 

30 

4725 (46%) 

2865 (12%) 

2185 (76%) 

1931 (41%) 

8527 ( 5%) 

3235 ( 1%) 

6454 (26%) 

2960 ( 5%) 

32 

3952 (57%) 

3088 (11%) 

1933 (80%) 

1838 (48%) 

9051 ( 5%) 

3136 (10%) 

6088 (34%) 

3040 ( 8%) 

35 

4178 (59%) 

3362 (11%) 

2232 (79%) 

2022 (47%) 

9866 ( 5%) 

3782 ( 1%) 

5828 (42%) 

3439 ( 5%) 

36 

4346 (58%) 

3306 (15%) 

2115 (80%) 

2009 (49%) 

10169 ( 5%) 

3889 ( 1%) 

6380 (39%) 

3526 ( 6%) 

40 

4408 (62%) 

3858 (11%) 

1889 (84%) 

1869 (57%) 

11296 ( 5%) 

4322 ( 1%) 

6433 (44%) 

3924 ( 5%) 

42 

4790 (61%) 

3820 (16%) 

2158 (83%) 

2048 (55%) 

11850 ( 5%) 

4528 ( 1%) 

6685 (45%) 

4124 ( 5%) 

48 

4743 (66%) 

4388 (15%) 

2080 (85%) 

2038 (61%) 

13385 ( 6%) 

5169 ( 1%) 

7876 (43%) 

4718 ( 5%) 
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successfully used for optimization of a free-topology mixer and a heat exchanger (see Table [ 2 ]). The resulting 
optimal geometry has a very complex, almost fractal shape. 

It was also demonstrated that the code has nearly linear weak scaling, while the strong scaling saturates 
at 10-15 GPUs. This was made possible by application of various code generation and parallel program¬ 
ming techniques. Proposed adjoint problem formulation has even better parallel properties, keeping linear 
scalability even for smaller meshes. 

Overall, we found Adjoint Lattice Bolzmann to be a powerful technique for topology optimization in 
mesoscopic multi-physics flow problems. Using this technique for inverse design, optimal control and time- 
dependent problems, needs further research. 
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